Abstract


In recent years, scientists have attempted to model and explain human dynamics, and in particular human movement. Many aspects of our complex lives are affected by human movement, such as disease spread and epidemic modeling, city planning, wireless network development, and disaster relief, to name a few. Given this myriad of applications, a complete understanding of how people move in space can bring huge benefits to our society. Most recent works have focused on the idea that people's movements are biased towards frequently visited locations. According to these works, human movement is based on an exploration/exploitation dichotomy in which individuals either choose new locations (exploration) or return to frequently visited locations (exploitation). In this work we focus on the concept of recency. We propose a model in which exploitation in human movement also considers recently visited locations, and not solely frequently visited ones. We test our hypothesis against different empirical datasets of human mobility and show that our proposed model better explains the human trajectories in these datasets.
Introduction

Understanding the fundamental mechanisms governing human mobility is important for many research fields, such as epidemic modeling […], urban planning […], and traffic engineering [12, …]. Although individual human trajectories can seem unpredictable and intricate to an external observer, they are, in fact, very predictable and regular over space and time […]. One characteristic of human motion, widely observed in empirical data, is our tendency to spend most of our time in just a few locations […, 20]. More generally, the distribution of visitation frequencies has been observed to be heavy-tailed […].

However, the fundamental mechanisms responsible for shaping our visitation preferences are still not fully understood. The preferential return (PR) mechanism, proposed by Song et al. [21], offered an elegant and robust model for the visitation frequency distribution. It defines the probability $\Pi_i$ of returning to a location $i$ as $\Pi_i \propto f_i$, where $f_i$ is the visitation frequency of location $i$. This implies that the more visits a location receives, the more visits it is going to receive in the future, a phenomenon that in different fields goes by the names of the Matthew effect […], cumulative advantage [18], or preferential attachment […].
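The PR rule can be made concrete with a minimal Python sketch, assuming a user's visit counts are kept in a `collections.Counter`; the function name `preferential_return` is hypothetical and not taken from Song et al. Each return draws a previously visited location $i$ with probability $f_i / \sum_j f_j$.

```python
import random
from collections import Counter


def preferential_return(visits, rng):
    """Draw a previously visited location i with probability f_i / sum_j f_j.

    `visits` is a Counter mapping each location to its visitation frequency f_i.
    """
    locations = list(visits)
    weights = [visits[loc] for loc in locations]  # f_i for each location
    return rng.choices(locations, weights=weights, k=1)[0]


# Usage: with frequencies 8, 2 and 1, draws land on "home", "work" and "gym"
# roughly 8/11, 2/11 and 1/11 of the time.
rng = random.Random(42)
visits = Counter({"home": 8, "work": 2, "gym": 1})
draws = Counter(preferential_return(visits, rng) for _ in range(11_000))
print(draws)
```

In a full simulation the caller would also increment `visits[choice]` after every return; that feedback is what turns the rule into cumulative advantage.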

Although the PR mechanism, as part of the Exploration and Preferential Return (EPR) individual mobility model, was used to reproduce some of the scaling properties of human mobility, its general principles rest on assumptions that are implausible from the point of view of human behavior. In the long term, assuming PR as a property of human motion leads to two discrepancies. First, in the model, the earlier a location is discovered, the more visits it is going to receive. This implies that the first visited location will most likely also be the most visited one. Second, if cumulative advantage indeed held for human movements, people would never change their preferences, which is clearly not true.

Here we propose that the PR mechanism has to simultaneously consider the frequency and the timing of visits in individual human trajectories. Using mobility data obtained from call detail records (CDRs) and location-based check-ins produced by thousands of users, we uncover a strong tendency of individuals to return to recently visited locations, a behavior similar to the one observed by Szell et al. in a virtual world [23]. Moreover, we show that this tendency is not conditioned on previous visitation frequencies. Last, we introduce a variation of the EPR model that incorporates the influence of recency on individual trajectories. Our approach is based on the empirical evidence that the longer the time since the last visit to a location, the lower the probability of observing a user at that location […].

Materials and Methods 

Data 

In this work, we used two mobility datasets. The first one (D1) corresponds to 6 months of anonymized mobile-phone traces from a large metropolitan area in Brazil. This dataset is composed of 8,898,108 records from 30,000 users between January 1 and June 30, 2014. The second dataset (D2) is composed of 23,736,435 check-ins from 51,406 Brightkite¹ users at 772,966 different locations. Unlike the mobile phone data, locations in the Brightkite dataset correspond to the actual places where the users checked in; phone data locations correspond to the antenna tower the phone communicates with and hence are only approximations of the user's actual location.

¹ Brightkite was a location-based social networking service launched in 2007 and closed in 2011 […].

Since our interest here is in individuals' trajectories, in this analysis we considered only the data that provide information about the users' displacements. Hence, we filtered out repeated consecutive observations at the same place, resulting in a time series for each individual representing their trajectory over the observed period. For instance, if A, B and C are locations and the data show a user at the locations (in this order) [A, B, B, B, C, C, A, A, A, B], the trajectory is considered to be [A, B, C, A, B], because users remaining at the same location between consecutive data points are not considered to have “moved”. Furthermore, to reduce the influence of co-located antennas (common in densely populated areas), we merged antennas less than 10 meters apart under a single ID.
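Both preprocessing steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names (`to_trajectory`, `haversine_m`, `merge_colocated`) are hypothetical, antennas are given as `(lat, lon)` pairs in degrees, and chains of nearby antennas are merged transitively with a union-find, a choice the text does not specify.

```python
import math
from itertools import groupby


def to_trajectory(observations):
    """Collapse consecutive repeated observations into a single visit."""
    return [loc for loc, _ in groupby(observations)]


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    r = 6_371_000.0  # mean Earth radius in metres
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def merge_colocated(antennas, threshold_m=10.0):
    """Map each antenna id to a representative id (assumed union-find merge).

    Antennas closer than `threshold_m` end up with the same representative.
    O(n^2) pair check: fine for a sketch, a spatial index is needed at scale.
    """
    parent = {a: a for a in antennas}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    ids = list(antennas)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if haversine_m(antennas[a], antennas[b]) < threshold_m:
                parent[find(a)] = find(b)
    return {a: find(a) for a in ids}


print(to_trajectory(["A", "B", "B", "B", "C", "C", "A", "A", "A", "B"]))
# ['A', 'B', 'C', 'A', 'B']

antennas = {"t1": (-23.5505, -46.6333), "t2": (-23.55053, -46.6333),
            "t3": (-23.56, -46.64)}  # hypothetical coordinates; t1, t2 ~3 m apart
rep = merge_colocated(antennas)
print(to_trajectory([rep[a] for a in ["t1", "t2", "t3", "t3", "t1"]]))
# ['t2', 't3', 't2']
```

Merging is applied before collapsing so that a user switching between two co-located antennas is not counted as a displacement.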

Heterogeneities in human mobility 

The first analysis we performed measures the population-level heterogeneities reflected in the different activity patterns. First, we determined the number of observed displacements ($N$) per user during the period considered. Note that this does not necessarily represent the actual number of displacements, but rather the number of jumps per user captured by the datasets. All scaling parameters were estimated using the methods described by Clauset et al. [1].

The distributions $p(N)$ of D1 and D2 are best approximated by truncated power-law distributions, defined as $p(x) = C x^{-\alpha} e^{-x/\tau}$, whose parameters were estimated using the maximum likelihood method (see ?? for statistical validation). For D1, the parameters were found to be $\alpha_{D1} \approx 1.0$ and $\tau_{D1} \approx 783$ observations, whereas for D2 they are $\alpha_{D2} \approx 1.3$ and $\tau_{D2} \approx 923$ observations (see Figure 1). This means that in both datasets users tend not to move much, and highly mobile individuals are very rare. For instance, in D1 the average number of daily displacements is approximately 2.2, whereas in D2 it is approximately 1.7. The average number of jumps per month is 24.5 in D1 and 9.2 in D2. The lower average number of movements in D2 could be because Brightkite was a location-based social networking service; hence, movements related to social activities are likely overrepresented in it. Nevertheless, given that our focus is on individuals' visitation preferences rather than needs, this does not negatively affect our analysis.
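To make the fitted form $p(x) = C x^{-\alpha} e^{-x/\tau}$ concrete, here is a minimal discrete maximum-likelihood sketch in pure Python. It is a simplified stand-in for the Clauset et al. procedure, not a reproduction of it: it assumes $x_{min} = 1$ instead of estimating it, truncates the normalization sum at a hypothetical `x_max`, and finds $(\alpha, \tau)$ by brute-force grid search. All function names are illustrative.

```python
import math
import random
from collections import Counter


def tpl_weights(alpha, tau, x_max):
    """Unnormalized x**-alpha * exp(-x/tau) for x = 1 .. x_max."""
    return [x ** -alpha * math.exp(-x / tau) for x in range(1, x_max + 1)]


def tpl_log_likelihood(counts, alpha, tau, x_max):
    """Log-likelihood of integer data (a Counter x -> count) under the discrete
    truncated power law supported on 1 .. x_max."""
    n = sum(counts.values())
    log_norm = math.log(sum(tpl_weights(alpha, tau, x_max)))  # log(1/C)
    body = sum(c * (-alpha * math.log(x) - x / tau) for x, c in counts.items())
    return body - n * log_norm


def fit_tpl(data, alphas, taus, x_max=5000):
    """Grid-search maximum likelihood estimate of (alpha, tau)."""
    counts = Counter(data)
    grid = [(a, t) for a in alphas for t in taus]
    return max(grid, key=lambda p: tpl_log_likelihood(counts, p[0], p[1], x_max))


def sample_tpl(alpha, tau, n, rng, x_max=5000):
    """Draw n integers from the same discrete truncated power law."""
    return rng.choices(range(1, x_max + 1),
                       weights=tpl_weights(alpha, tau, x_max), k=n)


# Usage: recover known parameters from synthetic data.
rng = random.Random(1)
data = sample_tpl(1.5, 50.0, 20_000, rng)
alphas = [round(1.0 + 0.05 * i, 2) for i in range(21)]  # 1.00, 1.05, ..., 2.00
taus = [10.0, 25.0, 50.0, 100.0, 200.0]
best = fit_tpl(data, alphas, taus)
print(best)
```

On synthetic data the estimate should fall near the generating parameters; on real data, values above `x_max` would have to be excluded or `x_max` raised.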



Figure 1: The number of observed displacements per user. The probability density function of the number of observed displacements N [observations] during the observational period.

From the human mobility perspective, we extracted the number of distinct locations users visited during the period (Figure 2), which depicts the probability $p(S)$ of a user having visited $S$ distinct locations by the end of the observational period. For D1, the number of visited locations is best fitted by a log-normal distribution with parameters $\mu_{D1} \approx 3.16$ and $\sigma_{D1} \approx 0.73$, while D2 follows a truncated power law with parameters $\alpha_{D2} \approx 1.22$ and $\tau_{D2} \approx 200.0$. The CCDF (complementary cumulative distribution function) in linear scale (inset of Figure 2) makes it even more evident that we spend most of our time in very few locations. To illustrate, about 30% of the time users in D1 were found at just 2 locations, while in D2 this number was approximately 40%.
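The quantities behind Figure 2 can be sketched per user as follows. The text does not state exactly how the "30% of the time at 2 locations" figure is aggregated, so `top_k_share` below is one assumed measure: the fraction of a user's visits that fall on their $k$ most visited locations, with visit counts used as a proxy for time. The helper names are hypothetical.

```python
from collections import Counter


def distinct_locations(trajectory):
    """S: number of distinct locations visited in the trajectory."""
    return len(set(trajectory))


def top_k_share(trajectory, k=2):
    """Fraction of a user's visits that fall on their k most visited locations."""
    counts = Counter(trajectory)
    return sum(c for _, c in counts.most_common(k)) / len(trajectory)


def ccdf(values):
    """Empirical CCDF as a list of (v, P(V >= v)) for each distinct value v."""
    xs = sorted(values)
    n = len(xs)
    return [(v, (n - i) / n) for i, v in enumerate(xs) if i == 0 or v != xs[i - 1]]


traj = ["home", "work", "home", "gym", "home", "work", "cafe", "home"]
print(distinct_locations(traj), top_k_share(traj))  # 4 0.75
print(ccdf([1, 2, 2, 3]))  # [(1, 1.0), (2, 0.75), (3, 0.25)]
```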



Figure 2: Number of distinct visited locations. The probability density function of the number S [locations] of unique locations visited per user. Solid lines correspond to the best fits. The inset shows the CCDF of the distribution in linear scale, illustrating that people tend to concentrate most of their visits on just a few locations.


Temporal patterns 

In a modern society, where most people have daily routines, part of our trajectories is constrained to a limited number of locations at regular time intervals. Human activity routines are responsible for part of the regularities manifested in human movements. From the empirical data, we extracted the time interval (in hours) between two consecutive visits to a location. The distribution of time intervals is depicted in Figure 3. The plot reveals two important features of human movements. First, one can observe peaks at 24 h intervals, representing the users' daily routines [8], as well as weekly repetitive patterns, as previously observed [26]. More formally, the probability of returning to a location decreases as $p(\Delta t) \propto \Delta t^{-\beta} e^{-\Delta t/\kappa}$, with $\beta_{D1} \approx 1.405$ and $\kappa_{D1} \approx 2{,}189$ hours, and $\beta_{D2} \approx 1.425$ with $\kappa_{D2} \approx 6{,}791$ hours. Second, both distributions exhibit very similar power-law exponents $\beta$, even though the two datasets are very different in terms of coverage, spatial resolution, acquisition method, and time span. This suggests that the temporal dimension of return movements is scale invariant, supporting the general nature of our findings.
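The inter-visit times $\Delta t$ behind Figure 3 can be extracted with a short sketch. It assumes each user's data is a time-ordered list of `(timestamp, location)` pairs and measures $\Delta t$ from the timestamp of the previous visit to the same location; the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def return_intervals_hours(visits):
    """Hours elapsed between consecutive visits to the same location.

    `visits` is a time-ordered list of (datetime, location) pairs for one user.
    """
    last_seen = {}
    intervals = []
    for ts, loc in visits:
        if loc in last_seen:
            intervals.append((ts - last_seen[loc]).total_seconds() / 3600)
        last_seen[loc] = ts
    return intervals


def hourly_histogram(intervals):
    """Normalized histogram of intervals binned to whole hours."""
    counts = Counter(int(dt) for dt in intervals)
    total = sum(counts.values())
    return {h: c / total for h, c in sorted(counts.items())}


visits = [(datetime(2014, 1, 1, 8), "home"), (datetime(2014, 1, 1, 9), "work"),
          (datetime(2014, 1, 1, 19), "home"), (datetime(2014, 1, 2, 9), "work"),
          (datetime(2014, 1, 2, 19), "home")]
intervals = return_intervals_hours(visits)
print(intervals)  # [11.0, 24.0, 24.0]
hist = hourly_histogram(intervals)
```

Pooling these intervals over all users gives the empirical $p(\Delta t)$, whose 24 h peaks come from routine returns such as the two 24-hour intervals in this toy example.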




Figure 3: Return probabilities as a function of the elapsed time Δt since the most recent visit. Peaks are observed at 24 h intervals, capturing the temporal regularity with which humans return to previously visited locations. The return probability also decays very quickly as time increases. Solid lines correspond to the truncated power-law fits with exponents $\beta_{D1} \approx 1.405$ and $\beta_{D2} \approx 1.425$.


Results 

A rank-based analysis of human visitation patterns

In this work, we propose a rank-based approach to the analysis of human trajectories. To this end, we defined two rank variables, namely the frequency rank ($K_f$) and the recency rank ($K_s$). Both ranks were measured on an expanding basis from the accumulated sub-trajectories. To illustrate, consider a particular user $x$ with a trajectory $T = [l_1, l_2, \ldots, l_n]$, $l_i \in [1, \ldots, N]$, composed of $n$ steps to $S \le N$ locations. For each step $j > 1$, we have the partial trajectory $T_j = [l_1, l_2, \ldots, l_{j-1}]$ composed of all the previous steps, with $l_{j-1}$ being the immediately preceding step. From the sub-trajectory $T_j$ we compute the frequency-based ranks $K_f$ of all locations visited so far. If step $l_j$ is a return (i.e., $l_j \in T_j$), the frequency rank of location $l_j$ is $k_f(j) = K_f[l_j]$; the recency rank $k_s(j) = K_s[l_j]$ is obtained analogously from the order of the last visits in $T_j$.


As previously described, the PR mechanism suggests that the visitation probability of a particular location is proportional to the number of previous visits to it (captured by $K_f$). Our claim is that the Zipf's law observed in the distribution of visitation frequencies is also influenced by our tendency to return to recently visited locations (captured by $K_s$).

To test this influence, we compared the return probabilities obtained from two ranking approaches: one based on the visitation frequencies ($K_f$) and the other based on the recency of the last visit to a location ($K_s$).

In summary, the two ranks can be described as:

• $K_s$ is the recency-based rank. A location with $K_s = 1$ at time $t$ was the previously visited location; $K_s = 2$ means that it was the second-most-recently visited location up to time $t$, and so on.

• $K_f$ is the frequency-based rank. A location with $K_f = 1$ at time $t$ was the most visited location up to that point in time. Similarly, a location with $K_f = 2$ is the second-most-visited location up to time $t$.
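The expanding-window computation of both ranks can be sketched as follows. This is an illustrative implementation, not the authors' code: `return_ranks` is a hypothetical name, frequency ties are broken by order of first visit (the text does not specify a tie-breaking rule), and $k_s = 1$ denotes the immediately preceding location, as in the definitions above.

```python
from collections import Counter


def return_ranks(trajectory):
    """Return (k_f, k_s) for every return step of a trajectory.

    k_f: frequency rank of the destination in the partial trajectory
         (1 = most visited so far; ties broken by first visit, an assumption).
    k_s: recency rank of the destination (1 = the immediately preceding location).
    """
    counts = Counter()   # visits per location in l_1 .. l_{j-1}
    first_seen = {}      # step of first visit, used only for tie-breaking
    recency = []         # distinct locations, most recently visited first
    ranks = []
    for j, loc in enumerate(trajectory):
        if loc in counts:  # l_j is a return
            order = sorted(counts, key=lambda x: (-counts[x], first_seen[x]))
            ranks.append((order.index(loc) + 1, recency.index(loc) + 1))
            recency.remove(loc)
        else:
            first_seen[loc] = j
        counts[loc] += 1
        recency.insert(0, loc)
    return ranks


print(return_ranks(["A", "B", "C", "A", "B"]))  # [(1, 3), (2, 3)]
```

Note that on a trajectory with consecutive duplicates removed, this convention never yields $k_s = 1$ for a return, because the preceding location always differs from the destination.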

Given the definitions above, we first analyzed the probability of return as a function of $K_s$. This analysis shows that the probability decays very rapidly with $K_s$ (Figure 4). More precisely, for D1, $p(K_s)$ follows a truncated power law with exponent $\alpha_{K_s} \approx 1.644$, whereas the best fit for the frequency-based rank distribution is achieved with $\alpha_{K_f} \approx 1.86$. For D2, the best fit of the return rank distribution is obtained with $\alpha_{K_s} \approx 1.699$ for the recency rank, whereas the frequency rank has exponent $\alpha_{K_f} \approx 1.625$ (see Appendices A and B for details on the curve-fitting methods and results). As with the distributions of inter-return times $p(\Delta t)$, the scaling exponents of the recency rank distribution are very similar in both datasets. This suggests that the recency rank may capture a fundamental mechanism underlying return movements.
Figure 4: Comparison between the probability of return by recency and frequency ranks. The distributions of both ranks are better approximated by truncated power laws (dashed lines). (a) For D1, the recency-based rank distribution has exponent $\alpha_{K_s} \approx 1.644$ and exponential cut-off $\tau_{K_s} \approx 41.66$, whereas the frequency-based rank distribution is best fit with $\alpha_{K_f} \approx 1.86$ and $\tau_{K_f} \approx 37$. (b) For D2, the best fit is achieved with $\alpha_{K_s} \approx 1.699$ and $\tau_{K_s} \approx 250$ for the recency rank, whereas the frequency rank has parameters $\alpha_{K_f} \approx 1.625$ and $\tau_{K_f} \approx 125$.


Recency over frequency: the role of recent events in human mobility

In this section we explore the two-dimensional density distribution of returns $p(K_f, K_s)$. The idea is to investigate return probabilities as the joint outcome of visitation frequencies and visitation times, encoded simultaneously in $K_f$ and $K_s$. If users have a stronger preference for recently visited locations, we should observe that:

1. low values of $K_s$ are frequently observed over a wide range of $K_f$. This would suggest that we tend to return to recently visited locations even if we have not visited them many times before (i.e., locations with a high $K_f$);

2. the return frequency drops quickly as $K_s$ grows, even for low $K_f$ values, suggesting that the probability of returning to a location decays with time, even if it was a highly visited location.

To test these hypotheses, we analyzed the frequency of returns with ranks $(K_f, K_s)$ for all $K_f$ and $K_s$ values. For example, a visit to a location with ranks (10, 3) means a return to the 10th most visited site after visiting 3 other locations. This return distribution is represented as a two-dimensional histogram (shown as heatmaps) for each of the datasets (Figure 5). From the heatmaps, we can observe that returns to the most visited locations (e.g., $K_f < 7$) have shorter return trajectories. In other words, we tend to return to our most visited locations after visiting very few other locations, as shown by the rapid decrease in return frequencies as $K_s$ grows. For instance, in D1 more than 86% of the returns to the most visited location occurred after visiting fewer than five other locations, while for D2 it was more than 91% (see Figure 6).
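Given the list of $(k_f, k_s)$ pairs of all returns, the heatmap and the "fewer than five other locations" statistic reduce to simple counting. A minimal sketch, with hypothetical function names, assuming that with $k_s = 1$ denoting the immediately preceding location, $k_s - 1$ other distinct locations were visited since the last visit:

```python
from collections import Counter


def rank_histogram(pairs):
    """Normalized 2-D histogram p(k_f, k_s) from a list of (k_f, k_s) returns."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


def short_return_fraction(pairs, k_f=1, max_between=4):
    """Fraction of returns to the k_f-th most visited location that happened
    after visiting at most `max_between` other distinct locations (assumed to
    be k_s - 1)."""
    ks = [s for f, s in pairs if f == k_f]
    if not ks:
        return 0.0
    return sum(1 for s in ks if s - 1 <= max_between) / len(ks)


pairs = [(1, 2), (1, 3), (1, 7), (2, 2), (1, 2)]  # toy (k_f, k_s) returns
print(short_return_fraction(pairs))  # 0.75
```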

We can also observe that recency increases the probability of returning to less visited locations (e.g., $7 < K_f < 40$), as expressed by a broader distribution of $K_f$ when $K_s$ is low (e.g., $K_s < 3$). For instance, a closer look at the bottom rows of the plots in Figure 5 (in the detail) shows that a recent visit to a location can increase the probability of returning to it by up to 10 times in D2 (see Figure 5d).
When we compare D1 and D2, we observe slightly different patterns. The effect of recency is much stronger in D2 than in D1. This difference could be explained by the fact that the mobility data of D1 is coarse-grained to the cell-tower level. D2, on the other hand, provides finer-grained mobility data, capturing changes in visitation preferences regardless of the distance between locations.

Additionally, to verify whether the power law observed in the recency rank distribution is rooted in the temporal semantics of individuals' trajectories, we applied our rank-based approach to randomized versions of both empirical datasets (D1 and D2). The first randomized dataset (R1) was obtained by uniformly shuffling each individual trajectory. This way, we artificially remove any temporal information possibly encoded within the individual trajectories, while keeping the visitation frequencies intact. In the second randomization method (R2), we also remove the visitation frequencies by generating, for each user, a new random trajectory with the same number of displacements and the same number of distinct visited locations. To serve as a baseline for the analyses, the third randomization approach (R3) produces a new dataset of the same size as the original one, keeping only the total numbers of users and locations. More precisely, for each dataset we generated a randomized version with $M$ random points

$$v_m = [u_m, l_m], \quad m \in [1, \ldots, M],$$

where each $u_m$ and $l_m$ is uniformly sampled from the $U$ users and $L$ locations, respectively, with $M$, $U$ and $L$ the same as in D1 and D2.
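The three null models can be sketched as follows. This is a minimal illustration with hypothetical function names; in particular, the way R2 preserves the number of distinct locations (each of `n_distinct` placeholder labels appears at least once, the remaining steps are filled uniformly at random) is an assumption, and neither R1 nor R2 here removes consecutive repeats that the shuffling may create, a detail the text does not address.

```python
import random


def r1_shuffle(trajectory, rng):
    """R1: shuffle a user's trajectory, keeping visitation frequencies intact."""
    shuffled = list(trajectory)
    rng.shuffle(shuffled)
    return shuffled


def r2_random_trajectory(n_steps, n_distinct, rng):
    """R2: random trajectory with the same length and number of distinct
    locations (assumes n_steps >= n_distinct; labels are placeholders)."""
    labels = list(range(n_distinct))
    steps = labels + [rng.choice(labels) for _ in range(n_steps - n_distinct)]
    rng.shuffle(steps)
    return steps


def r3_baseline(n_points, n_users, n_locations, rng):
    """R3: M points v_m = [u_m, l_m] with users and locations drawn uniformly."""
    return [(rng.randrange(n_users), rng.randrange(n_locations))
            for _ in range(n_points)]


rng = random.Random(0)
traj = ["A", "B", "C", "A", "B"]
print(r1_shuffle(traj, rng))
print(r2_random_trajectory(10, 4, rng))
print(r3_baseline(3, 5, 7, rng))
```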

The first feature we can observe is that when we shuffle the trajectories in D1 (Figure 7a), the rank distribution exhibits a pattern similar to that of the original data. This supports our claim that the predominance of preferential return, as captured by the aggregated mobile phone data of D1, masks the micro-level dynamics characteristic of the recency effect. Indeed, a closer look at the bottom rows of Figure 7a does not show any increased probability due to recency. When we artificially destroy the power-law distribution of the visitation frequencies (Figure 7b), we observe a dramatic change in the rank distribution. This suggests that a significant part of the rank distribution of D1 is indeed rooted in the visitation frequencies, as predicted by the PR mechanism.

When we analyze the randomized versions of D2, the influence of recency becomes even more evident. Shuffling the individuals' trajectories (Figure 7d) removes the features described in Figure 5 (again, the evidence in the bottom rows disappears). Moreover, by removing the temporal information from the visitation sequences of D2, the rank distributions acquire the same form as that of D1.

When we look at the recency rank distributions for the randomized data (Figure 8), we see that the recency rank of the shuffled trajectories deviates from the empirical data, showing that the recency effect is indeed present in both datasets. More striking, however, is that this analysis not only shows that the recency effect is limited to the most recently visited locations but also suggests a possible upper limit to the effect. More precisely, the recency effect is most strongly observed when returns occur after visiting 2 locations in D1 and 3 locations in D2. This means that if an individual is observed again at a recently discovered location after visiting fewer than 3 other locations, this location is very likely to become a frequently visited one.

In summary, our approach has shown strong evidence that returns in human trajectories are shaped by two distinct ingredients: one responsible for long-term regularities (such as the PR mechanism) and another accounting for changes in visitation preferences, whereby recently visited locations also become highly visited.

The Recency-based model 

To test to what extent the patterns we observed in the rank distribution correspond to an unforeseen mechanism of human mobility, we tested the hypothesis that they simply emerge from the data when we build the sequence-based ranks of frequency-driven trajectories. For this argument to hold, the same patterns must be observed in the synthetic data produced by the EPR model. To test our hypothesis, we compared the purely frequentist mechanism of the EPR model against our new human mobility model, in which returns have a bias toward recently visited locations.

Figure 5: Return probabilities. Each point represents a return, and the color encodes the density of points. The ranks were shifted so that the highest-ranked locations are at (0, 0). A point (x, y) in the histogram represents a return to the (x + 1)th most visited location after y + 1 steps. (a) The return rank distribution of D1 shows that the recency influence is less pronounced in D1 than in D2. (b) The fine-grained data of D2, on the other hand, show a strong influence of recency.

The recency-based model extends the preferential return mechanism with a component capable of capturing the visitation bias towards recently visited locations; all ingredients of the EPR model were kept intact except for the temporal dimension. The reason is that the waiting-time distribution of the EPR model determines only when an individual is going to move (i.e., how long they wait before the next jump), but not where they go. It is important to emphasize that the recency bias underlying our model concerns the visitation path and is time-independent.

Figure 6: Fraction of returns to the $K_f$ most visited location occurring after visiting $L$ different locations, for D1 (a) and D2 (b); the x-axis shows the number of visited locations [L]. Another way to look at the recency effect is to analyze the number of distinct locations visited between two visits to a location. People tend to return to their most visited locations after visiting very few places. (a) In D1, more than 86% of the returns to the most visited location occurred after visiting fewer than five other locations, while for D2 (b) it was more than 91%.

The model can be described as follows. First, a population of $N$ agents is initialized and scattered randomly over a discrete lattice with $M \times M$ cells, each representing a possible location. The initial position of each agent counts as its first visit. At each time step, an agent visits a new location with probability $p_{new} = \rho S^{-\gamma}$, where $\rho = 0.6$ and $\gamma = 0.6$ are control parameters, whose values were derived by Song et al. [21] from empirical data, and $S$ is the number of distinct locations visited thus far. With complementary probability $1 - p_{new}$, the agent returns to a previously visited location. If the movement is a return, with probability $1 - \alpha$ the destination is the $i$-th last visited location, drawn from a Zipfian distribution (Zipf's law) $p(i) \propto k_s(i)^{-\eta}$, where $k_s(i)$ is the recency-based rank of location $i$. The parameter $\eta$ controls the number of previously visited locations a user would remember when deciding which location to visit. With probability $\alpha$, the destination is selected based on the visitation frequencies, with probability $\Pi_i \propto k_f(i)^{-1-\gamma}$, where $k_f(i)$ is the frequency rank of location $i$. Notice that when $\alpha = 1$ we recover the original preferential return behavior of the EPR model, while when $\alpha = 0$ returns are based solely on recency. We experimentally tested different parameter configurations for the model. Our analyses showed that when $\alpha = 0$ the heavy tail of the visitation frequency distribution disappears, while for $\alpha = 1$ the power law of the recency distribution vanishes. This suggests that both mechanisms must be present to reproduce these two features. In practice, different individuals could have different $\alpha$ values. However, extracting $\alpha$ from the empirical data is not easy, since it is hard to determine whether a movement was driven by recency or by frequency. Nevertheless, we found that $\alpha = 0.1$ (i.e., 10% of the movements influenced by the visitation frequencies) was enough to restore both the recency and the frequency rank distributions. For the Zipfian distribution of the recency rank we used $\eta = 1.6$, extracted from the empirical data.
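The update rule above can be sketched for a single agent in Python. This is a simplified illustration under several assumptions that go beyond the text: new locations are drawn uniformly among unvisited cells of a `grid` × `grid` lattice (the EPR jump-length distribution is omitted), the agent's current cell is excluded before ranking return candidates (so rank 1 is the cell visited just before the current one), and frequency ties are broken by first visit. The function names and defaults are hypothetical; the default values of `rho`, `gamma`, `alpha` and `eta` are the ones quoted in the text.

```python
import random


def zipf_pick(ranked, exponent, rng):
    """Pick an item from `ranked` (rank 1 first) with probability ∝ rank**-exponent."""
    weights = [(r + 1) ** -exponent for r in range(len(ranked))]
    return rng.choices(ranked, weights=weights, k=1)[0]


def simulate_agent(steps, grid=100, rho=0.6, gamma=0.6, alpha=0.1, eta=1.6,
                   rng=None):
    """Trajectory (list of cells) of one agent under a sketch of the recency model."""
    rng = rng or random.Random()
    current = (rng.randrange(grid), rng.randrange(grid))
    counts = {current: 1}         # visitation frequency of each cell
    first_seen = {current: 0}     # step of first visit, breaks frequency ties
    recency = [current]           # visited cells, most recent first
    trajectory = [current]
    for t in range(1, steps):
        s = len(counts)
        explore = s == 1 or rng.random() < rho * s ** -gamma  # p_new = rho * S**-gamma
        if explore and s < grid * grid:
            loc = current
            while loc in counts:  # exploration: a cell never visited before
                loc = (rng.randrange(grid), rng.randrange(grid))
            first_seen[loc] = t
        else:
            candidates = recency[1:]  # recency[0] is the current cell
            if rng.random() < alpha:
                # Frequency-driven return: Pi_i ∝ k_f(i)**-(1 + gamma)
                ranked = sorted(candidates, key=lambda c: (-counts[c], first_seen[c]))
                loc = zipf_pick(ranked, 1 + gamma, rng)
            else:
                # Recency-driven return: p(i) ∝ k_s(i)**-eta
                loc = zipf_pick(candidates, eta, rng)
            recency.remove(loc)
        counts[loc] = counts.get(loc, 0) + 1
        recency.insert(0, loc)
        trajectory.append(loc)
        current = loc
    return trajectory


rng = random.Random(7)
population = [simulate_agent(500, rng=rng) for _ in range(10)]  # N = 10 agents
```

Setting `alpha=1.0` gives an EPR-like frequency-only return rule and `alpha=0.0` a recency-only one, which is how the two limiting cases discussed above could be explored with this sketch.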

Visually, the synthetic data produced by the EPR model seem to approximate the empirical data well (see Figure 10a). However, when we compare the bottom-most rows of the histogram, the EPR output deviates from the empirical evidence by not capturing the broader distribution of $p(k_f, k_s)$ for recently visited locations. On the other hand, the recency-based model (RM) reproduces the recency influence observed in the empirical data (Figure 10b).








Figure 7: The rank-based analyses of randomized versions of the empirical datasets. When we compare the ranks obtained from the R1 randomization of D1 and D2 (a,d), we can observe that both distributions are very similar, sharing many common features, whereas after the second randomization (b,e) both D1 (b) and D2 (e) have very different shapes compared with the empirical data depicted in Figure 5. The data from the third randomization method (c,f) deviate completely from the empirical data, as do the other randomization results. This suggests that the patterns observed in the visitation rank distributions are indeed rooted in the way humans move.


When we look at the $K_f$ distribution, the EPR model recovers its heavy tail, as one would expect (Figure 10c). On the other hand, when we look at each variable individually, we notice that the $K_s$ distribution produced by the EPR model deviates from a power law. In fact, it is better approximated by an exponential distribution, whereas the recency model maintains its power-law behavior. The differences in the $K_s$ distributions produced by the two models become more evident in log-linear scale (inset of Figure 10d), where we can clearly see that the EPR model does not capture the preference for recently visited locations.


Discussion 

When we look at an individual's trajectories over, let us say, one year, the visitation patterns and regularities become very evident, and radical changes in visitation patterns, such as during a long vacation abroad or after starting a new job in another city, are very unlikely. In a large population these events do occur, but their effect at the population scale is very diluted and, sometimes, transient. Within such a limited time window, individuals are indeed predictable, and guessing that someone will be at one of their most visited locations is reasonable. However, it is very unlikely that an individual's preferences remain the same for 10 or 20 years. A recently discovered quality restaurant is a more plausible destination than our former workplace. Some events in our lives have the potential to reshape not only our visitation patterns but also our preferences. In this work we explored this idea within a simple rank-based framework. We unveiled empirical evidence supporting the idea that human trajectories are biased towards recently visited locations. We also offered a different perspective for the investigation of human mobility, in which the temporal dimension plays a role far more important than the inter-event times alone.

Figure 8: Comparison of the distributions of the recency-based ranks generated after different randomizations of the original datasets D1 (a) and D2 (b). The plots depict the PDF of $K_s$ in log-linear scale. When we shuffle the individuals' trajectories (R1), removing their temporal information (and hence any temporal effect such as recency), the upper part of the curves deviates from the empirical data. When we remove the visitation frequency distribution (R2), the tail of the recency distribution is also destroyed, approaching what we observe in the baseline curve (R3), where none of the original distributions is maintained.

Figure 9: Recency-based individual mobility model. Frequency-driven returns: $p(i) \propto k_f(i)^{-1-\gamma}$; recency-driven returns: $p(i) \propto k_s(i)^{-\eta}$. Notice that the exploration mechanism is kept the same as in the EPR model. In addition to the PR mechanism, the proposed model incorporates the recency effect, whereby recently visited locations also have a high visitation probability.
Figure 10: Comparison between the EPR model and the recency-based model (RM). (a) The analysis of the return ranks generated by the EPR model shows that it reproduces a pattern similar to the one observed in the empirical analysis, especially for D1. (b) In the presence of the recency mechanism, we observe the same high probability of return to recently visited locations (i.e., low $K_s$) as in the empirical data. (c) The distribution of the frequency ranks produced by the preferential return mechanism (red diamonds) exhibits a power law, in agreement with the empirical observations. Activating the recency mechanism does not affect the frequency rank distribution (purple hexagons). (d) However, the EPR mechanism does not capture the power-law behavior of the $K_s$ distribution observed in the empirical data.
Appendices


A Statistical analysis of the rank distributions


As presented in the main text, the probability distributions of the rank variables $K_f$ and $K_s$ are very similar. To assess whether the two rank variables come from the same distribution, we performed the two-sided Kolmogorov-Smirnov test. To test this hypothesis (i.e., that both variables come from the same distribution), we compare the Kolmogorov-Smirnov distance $D_{K_f,K_s}$ with the critical value $D_\alpha$ for a desired significance level $\alpha$, where

$$D_\alpha = c(\alpha)\sqrt{\frac{n_1 + n_2}{n_1 n_2}},$$

with sample sizes $n_1 = n_2 = n$ and $c(\alpha) = 1.95$ for $\alpha = 0.001$ [4, 5]. As Table A1 shows, $D_{K_f,K_s} > D_\alpha$ for both datasets, rejecting the hypothesis that $K_f$ and $K_s$ were drawn from the same distribution.
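The test above can be reproduced with a short pure-Python sketch; the function names are illustrative, and in practice `scipy.stats.ks_2samp` computes the same two-sample statistic. The critical-value helper recovers the $D_\alpha$ column of Table A1 from the sample sizes alone.

```python
import math
from bisect import bisect_right


def ks_statistic(sample1, sample2):
    """Two-sample Kolmogorov-Smirnov distance sup_x |F1(x) - F2(x)|."""
    a, b = sorted(sample1), sorted(sample2)
    d = 0.0
    for x in set(a) | set(b):  # the supremum is attained at a sample point
        f1 = bisect_right(a, x) / len(a)
        f2 = bisect_right(b, x) / len(b)
        d = max(d, abs(f1 - f2))
    return d


def ks_critical_value(n1, n2, c_alpha=1.95):
    """D_alpha = c(alpha) * sqrt((n1 + n2) / (n1 * n2)); c = 1.95 for alpha = 0.001."""
    return c_alpha * math.sqrt((n1 + n2) / (n1 * n2))


print(round(ks_critical_value(564228, 564228), 5))    # 0.00367
print(round(ks_critical_value(2267116, 2267116), 5))  # 0.00183
```

With the distances reported in Table A1 (0.07999 and 0.09708), both comparisons exceed their critical values by more than an order of magnitude.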


Table A1: Two-sided Kolmogorov-Smirnov test and p-values for the rank variables.

Dataset   n          D_{K_f,K_s}   D_α       p-value
D1        564228     0.07999       0.00367   0.0
D2        2267116    0.09708       0.00183   0.0


Additionally, to determine which probability function best characterizes the rank variables, we compared the goodness of fit offered by different heavy-tailed distributions. For this test, we measured the fit provided by two other distributions, namely the log-normal and the double-Pareto log-normal (dPlN). In some scenarios, the log-normal and the truncated power-law distributions can yield very similar results.

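More recently, the dPlN distribution has been reported to offer a sound model for many empirical data, such as income distributions, oil-field sizes [3], and the degree distribution of social networks [2]. The double-Pareto log-normal corresponds to a mixture of two power laws joined by a log-normal segment [3]. With shape parameters $\alpha, \beta > 0$ and log-normal parameters $\nu$ and $\tau$, the PDF of the dPlN can be defined as

$$f(x) = \frac{\alpha\beta}{\alpha+\beta}\left[f_1(x) + f_2(x)\right] \qquad (1)$$

$$f_1(x) = x^{-\alpha-1}\, e^{\alpha\nu + \alpha^2\tau^2/2}\, \Phi\!\left(\frac{\ln x - \nu - \alpha\tau^2}{\tau}\right)$$

$$f_2(x) = x^{\beta-1}\, e^{-\beta\nu + \beta^2\tau^2/2}\, \Phi^c\!\left(\frac{\ln x - \nu + \beta\tau^2}{\tau}\right)$$

where $\Phi$ is the CDF (cumulative distribution function) of the standard normal $N(0,1)$ and $\Phi^c$ is the CCDF of $N(0,1)$.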
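Equation (1) translates directly into code. The sketch below, with hypothetical function names and arbitrarily chosen parameters for the check, evaluates the dPlN density using `math.erfc` for $\Phi$ and $\Phi^c$, and verifies numerically (trapezoidal rule in $\ln x$) that it integrates to one.

```python
import math


def std_normal_cdf(z):
    """Phi(z) for the standard normal N(0, 1)."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def std_normal_ccdf(z):
    """Phi^c(z) = 1 - Phi(z), computed without cancellation."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def dpln_pdf(x, alpha, beta, nu, tau):
    """Double-Pareto log-normal density f(x) of Eq. (1), for x > 0."""
    log_x = math.log(x)
    f1 = (x ** (-alpha - 1) * math.exp(alpha * nu + alpha ** 2 * tau ** 2 / 2)
          * std_normal_cdf((log_x - nu - alpha * tau ** 2) / tau))
    f2 = (x ** (beta - 1) * math.exp(-beta * nu + beta ** 2 * tau ** 2 / 2)
          * std_normal_ccdf((log_x - nu + beta * tau ** 2) / tau))
    return alpha * beta / (alpha + beta) * (f1 + f2)


def total_mass(alpha, beta, nu, tau, lo=-20.0, hi=20.0, steps=40_000):
    """Trapezoidal integral of f over x = e**u, u in [lo, hi] (dx = e**u du)."""
    h = (hi - lo) / steps
    vals = [dpln_pdf(math.exp(lo + i * h), alpha, beta, nu, tau) * math.exp(lo + i * h)
            for i in range(steps + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


mass = total_mass(2.5, 1.5, 0.0, 0.5)  # should be 1 up to numerical error
print(mass)
```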






Figure A1: Curve fits of different heavy-tailed distributions for both $K_f$ (top charts) and $K_s$ (bottom charts). In addition to the well-known log-normal (dot-dashed line) and truncated power-law (solid line) distributions, we also measured the goodness of fit of the double-Pareto log-normal.

The log-likelihood ratio test compares two competing candidate distributions; the one with the higher likelihood provides the better fit. The sign of the log-likelihood ratio indicates whether the target distribution (here, the truncated power law) prevails over the competing alternative hypothesis, whereas the p-value indicates the significance level of the test [1]. As shown in Table A2, both rank variables are indeed better approximated by truncated power laws.
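The ratio test can be sketched from per-observation log-likelihoods of the two fitted models, following the form described by Clauset et al.: $R = \sum_i (\ell_i^{(1)} - \ell_i^{(2)})$ and $p = \operatorname{erfc}\big(|R| / \sqrt{2 n \sigma^2}\big)$, with $\sigma^2$ the variance of the pointwise differences. The function name is hypothetical, the population variance is an assumed choice, and the toy inputs are not the paper's data.

```python
import math
from statistics import pvariance


def loglikelihood_ratio(ll_target, ll_alternative):
    """Log-likelihood ratio test between two fitted models.

    Inputs are per-observation log-likelihoods under each model.
    Returns (R, normalized R, p-value); R > 0 favours the target model.
    """
    diffs = [a - b for a, b in zip(ll_target, ll_alternative)]
    n = len(diffs)
    r = sum(diffs)
    var = pvariance(diffs)  # population variance of pointwise differences
    if var == 0:
        raise ValueError("identical pointwise log-likelihoods: test undefined")
    p = math.erfc(abs(r) / math.sqrt(2 * n * var))
    return r, r / math.sqrt(n * var), p


r, r_norm, p = loglikelihood_ratio([0.0, 0.0, 0.0, 0.0], [-1.0, -1.0, -2.0, 0.0])
print(r, p)  # R = 4.0 favours the target; p = erfc(2)
```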
Table A2: Comparison between the goodness of fit provided by the truncated power law and other distributions via the log-likelihood ratio test.

Dataset   Rank        Alternative distribution    Log-likelihood ratio   p-value
D1        frequency   Log-normal                  50.195                 0.0
D1        frequency   Double-Pareto log-normal    157.185                0.0
D1        recency     Log-normal                  46.147                 0.0
D1        recency     Double-Pareto log-normal    68.077                 0.0
D2        frequency   Log-normal                  114.455                0.0
D2        frequency   Double-Pareto log-normal    90.703                 0.0
D2        recency     Log-normal                  45.58                  0.0
D2        recency     Double-Pareto log-normal    128.884                0.0


B Parameter estimation

The probability distributions of the rank-based variables described in the main text were best approximated by truncated power-law distributions $p(x) = C x^{-\alpha} e^{-x/\tau}$. Parameters were estimated using the methods in Ref. [1].

Table B3: Estimated parameters of the truncated power-law distributions with the best fit for the rank variables.

Dataset   Rank        α       τ
D1        recency     1.644   41.66
D1        frequency   1.859   37.0
D2        recency     1.699   250.0
D2        frequency   1.625   125.0


[3] William J. Reed and Murray Jorgensen. The Double Pareto-Lognormal Distribution: A New Parametric Model for Size Distributions. Communications in Statistics - Theory and Methods, 33(8):1733-1753, 2004.

[4] N. Smirnov. Table for estimating the goodness of fit of empirical distributions. Ann. Math. Statist., 19(2):279-281, June 1948.

[5] Nikolai V. Smirnov. On the estimation of the discrepancy between empirical curves of distribution for two independent samples. Bull. Math. Univ. Moscou, 2(2), 1939.